Final Project for the PSD Final Exam (UAS)

Angga Fathan Rofiqy

26 December, 2023

Code is hidden by default; click Code to display it.

#                      -=( Install & Load Package Function )=-
install_load <- function(package1, ...) {

   # collect all arguments into a single character vector
   packages <- c(package1, ...)

   # check each package in turn
   for (package in packages) {

       # if the package is not installed locally, download it first
       if (!requireNamespace(package, quietly = TRUE)) {
          install.packages(package)
       }

       # load the package from its name given as a string
       library(package, character.only = TRUE)
   }
}
path <- function(){
  gsub  ( "\\\\",  "/",  readClipboard ()  )
}
# Copy a path to the clipboard, then call path() in the console
# Copy the returned R-style path and paste it into the desired variable
# Chart export directory
export.chart <- "C:/Users/Fathan/Documents/Obsidian Vault/2. Kuliah/Smt 5/8. Pengantar Sains Data/Tugas/Tugas Akhir/Chart"

1 Introduction

1.1 Group 3

Name                              NIM
Angga Fathan Rofiqy               G1401211006
Muhamad Farras Surya Dio Putra    G1401211018
Salsabila Dwi Rahmi               G1401211026
Dhiya Khalishah Tsany Suwarso     G1401211038

1.2 Background

A natural disaster is a natural event that can have a major impact on human populations. According to UCLouvain (2019), Indonesia is one of the countries with the highest number of natural disasters in the world, after the United States.

According to Chaudhary and Piracha (2021), the negative impacts of natural disasters include:

  • Food shortages

  • Post-disaster trauma

  • Large-scale migration

  • Financial and economic problems

  • Anxiety about the future

Natural disaster management, or mitigation, is a sustained effort to reduce the impact of disasters on people and property.

1.3 Objectives

This study aims to group the provinces of Indonesia according to natural disaster intensity data for 2018-2021 using clustering methods.

Benefits for the Government

The results of this analysis are expected to serve as a reference or guideline for the central and regional governments, so that they can focus on planning the steps needed to prevent or mitigate the impact of future disasters.

Benefits for the Public

The results of this analysis are expected to help the Indonesian public prepare better and learn how to respond to natural disasters, based on the intensity of natural disasters that occur in each province of Indonesia.

1.4 Clustering Methods

1.4.1 Hierarchical

The steps of a hierarchical cluster analysis are:

  1. Prepare the data. The data must be numeric so that distances can be computed.
  2. Compute the (dis)similarity, or distance, between every pair of observations in the dataset. The (dis)similarity measure can be chosen to suit the data. The resulting values are arranged into a distance matrix.
  3. Build a dendrogram from the distance matrix using a particular linkage method. Several linkage methods can be tried and the best dendrogram selected.
  4. Decide where to cut the tree (at a particular (dis)similarity value). This is the stage at which the clusters are formed.
  5. Interpret the resulting dendrogram.
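
The steps above can be sketched with base R functions only. This is a minimal illustration, not the analysis performed in this report: the data frame `toy`, the Euclidean distance, the average linkage and the choice of two clusters are all assumptions made for the example.

```r
# Step 1: toy numeric data (hypothetical): 6 observations, 2 variables
toy <- data.frame(
  x = c(1.0, 1.2, 0.9, 8.0, 8.3, 7.9),
  y = c(2.0, 2.1, 1.8, 9.0, 9.2, 8.8),
  row.names = paste0("obs", 1:6)
)

# Step 2: pairwise Euclidean distances arranged as a distance matrix
d <- dist(toy, method = "euclidean")

# Step 3: build the tree with one linkage method (average linkage here)
hc <- hclust(d, method = "average")

# Step 4: cut the tree into a chosen number of clusters
groups <- cutree(hc, k = 2)

# Step 5: inspect the cluster sizes; plot(hc) would draw the dendrogram
print(table(groups))
```

Swapping `method = "average"` for another linkage such as `"complete"` or `"ward.D2"` is the usual way to compare dendrograms in step 3.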

1.4.2 K-Means

1.4.3 Fuzzy C-Means

Fuzzy C-Means (FCM) is a soft clustering algorithm proposed by Bezdek (1974; 1981). Unlike K-means, where each data object belongs to exactly one cluster, in FCM each data object belongs to every cluster with a membership degree between 0 and 1. Consequently, objects closer to a cluster center have a higher membership degree in that cluster than objects scattered near its boundary.
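
To make the idea of membership degrees concrete, the sketch below computes the standard FCM membership update for fixed cluster centers in base R. The data `x`, the two `centers` and the fuzzifier `m = 2` are hypothetical values chosen for illustration; the full algorithm would alternate this step with an update of the centers.

```r
# Hypothetical one-dimensional data and two fixed cluster centers
x <- c(1, 2, 3, 8, 9, 10)
centers <- c(2, 9)
m <- 2  # fuzzifier (assumed value)

# Distance of every object (rows) to every center (columns)
d <- abs(outer(x, centers, "-"))

# Membership u[i, j] = 1 / sum_k (d[i, j] / d[i, k])^(2 / (m - 1))
u <- t(apply(d, 1, function(di) {
  # an object sitting exactly on a center belongs fully to that cluster
  if (any(di == 0)) return(as.numeric(di == 0))
  1 / sapply(di, function(dij) sum((dij / di)^(2 / (m - 1))))
}))

print(round(u, 3))
print(rowSums(u))  # the memberships of each object sum to 1
```

Objects near a center receive a membership close to 1 for that cluster, while objects midway between the centers receive more evenly split memberships.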

1.4.4 Gaussian Mixture Model (GMM)

This method assumes that the whole population is a mixture of Gaussian probability distributions, where each component has its own distribution parameters. The Expectation-Maximization (EM) algorithm is one of the most widely used algorithms for fitting mixture models.
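
The EM idea can be illustrated with a small base R sketch for a one-dimensional mixture of two Gaussians. The simulated data, the starting values and the fixed number of iterations are assumptions made for the example; it is a simplified illustration rather than the multivariate fitting procedure used by a dedicated package.

```r
set.seed(1)
# Hypothetical data: a mixture of two normal populations
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1))

# Starting values for mixing proportions, means and standard deviations
p  <- c(0.5, 0.5)
mu <- c(min(x), max(x))
s  <- c(sd(x), sd(x))

for (iter in 1:200) {
  # E-step: posterior probability that each point belongs to each component
  dens <- cbind(p[1] * dnorm(x, mu[1], s[1]),
                p[2] * dnorm(x, mu[2], s[2]))
  resp <- dens / rowSums(dens)

  # M-step: re-estimate the parameters from the weighted data
  nk <- colSums(resp)
  p  <- nk / length(x)
  mu <- colSums(resp * x) / nk
  s  <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)
}

print(round(rbind(proportion = p, mean = mu, sd = s), 2))
```

Each row of `resp` plays the same role as a soft cluster assignment: assigning every point to the component with the largest posterior probability gives a hard clustering.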

1.5 Data

The data are secondary data from www.bps.go.id, namely "Banyaknya Desa/Kelurahan Menurut Jenis Bencana Alam dalam Tiga Tahun Terakhir (Desa), 2021" (Number of Villages/Urban Villages by Type of Natural Disaster in the Last Three Years, 2021). The data consist of 34 observations, one for each province of Indonesia.

In addition, data on the number of villages per province in Indonesia in 2021 were obtained from www.bps.go.id. The counts were standardized by expressing the number of villages affected by each natural disaster as a percentage of the total number of villages in each province.
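
The percentage standardization can be sketched as follows. The counts and village totals for the two provinces are copied from the `str()` output shown in section 1.5.2; the object names `counts`, `n_desa` and `pct` are illustrative, and this is an assumed reconstruction of the calculation rather than the script that produced the percentage file.

```r
# Counts for two provinces, taken from the str() output of raw.data1
counts <- data.frame(
  Provinsi = c("ACEH", "SUMATERA UTARA"),
  Gada = c(4406, 3827),
  KG   = c(173, 127)
)

# Total number of villages per province, taken from raw.data3
n_desa <- c(6516, 6132)

# Divide each count column by the province total and express it as a percentage
pct <- counts
pct[-1] <- round(100 * counts[-1] / n_desa, 2)

print(pct)
```

The rounded results agree with the first two rows of the `Gada` and `KG` columns displayed for `raw.data2`.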

Variables used

Variable  Notation  Description                      Type
Gada      X1        No natural disaster              Numeric
KG        X2        Drought                          Numeric
KH        X3        Forest fire                      Numeric
GM        X4        Volcanic eruption                Numeric
APB       X5        Whirlwind / tornado / typhoon    Numeric
GPL       X6        Tidal wave                       Numeric
TSN       X7        Tsunami                          Numeric
GB        X8        Earthquake                       Numeric
BB        X9        Flash flood                      Numeric
BJR       X10       Flood                            Numeric
TL        X11       Landslide                        Numeric

1.5.1 Data Entry

install_load('rio')
raw.data1 <- import("https://raw.githubusercontent.com/Zen-Rofiqy/STA1381-PSD/main/Tugas/Tugas%20Akhir/Data%20PSD.csv")

raw.data2 <- import("https://raw.githubusercontent.com/Zen-Rofiqy/STA1381-PSD/main/Tugas/Tugas%20Akhir/Data%20PSD_Perc.csv")

raw.data3 <- import("https://raw.githubusercontent.com/Zen-Rofiqy/STA1381-PSD/main/Tugas/Tugas%20Akhir/Data%20PSD_Desa.csv")

1.5.2 Data Checking

Checking data types

str(raw.data1)
## 'data.frame':    35 obs. of  12 variables:
##  $ Provinsi: chr  "ACEH" "SUMATERA UTARA" "SUMATERA BARAT" "RIAU" ...
##  $ Gada    : int  4406 3827 513 1224 1001 2644 1195 2093 265 241 ...
##  $ KG      : int  173 127 43 51 16 98 19 30 0 27 ...
##  $ KH      : int  43 59 18 194 16 64 4 11 14 57 ...
##  $ GM      : int  1 82 0 0 0 0 0 0 0 0 ...
##  $ APB     : int  108 483 248 53 44 90 17 158 77 40 ...
##  $ GPL     : int  106 78 57 15 2 3 14 35 17 69 ...
##  $ TSN     : int  2 4 0 0 0 0 0 0 0 0 ...
##  $ GB      : int  493 964 364 0 36 49 66 47 0 0 ...
##  $ BJR     : int  1435 732 342 455 476 380 171 328 59 61 ...
##  $ BB      : int  81 52 65 1 17 36 15 23 0 1 ...
##  $ TL      : int  198 483 222 21 57 103 81 70 1 25 ...
str(raw.data2)
## 'data.frame':    35 obs. of  12 variables:
##  $ Provinsi: chr  "ACEH" "SUMATERA UTARA" "SUMATERA BARAT" "RIAU" ...
##  $ Gada    : num  67.6 62.4 44.3 65.2 64.1 ...
##  $ KG      : num  2.66 2.07 3.71 2.72 1.02 2.98 1.25 1.13 0 6.47 ...
##  $ KH      : num  0.66 0.96 1.55 10.34 1.02 ...
##  $ GM      : num  0.02 1.34 0 0 0 0 0 0 0 0 ...
##  $ APB     : num  1.66 7.88 21.4 2.83 2.82 ...
##  $ GPL     : num  1.63 1.27 4.92 0.8 0.13 ...
##  $ TSN     : num  0.03 0.07 0 0 0 0 0 0 0 0 ...
##  $ GB      : num  7.57 15.72 31.41 0 2.3 ...
##  $ BJR     : num  1.24 0.85 5.61 0.05 1.09 1.09 0.99 0.87 0 0.24 ...
##  $ BB      : num  22 11.9 29.5 24.2 30.5 ...
##  $ TL      : num  3.04 7.88 19.15 1.12 3.65 ...
str(raw.data3)
## 'data.frame':    35 obs. of  3 variables:
##  $ Provinsi   : chr  "Aceh" "Sumatera Utara" "Sumatera Barat" "Riau" ...
##  $ Jumlah Desa: int  6516 6132 1159 1876 1562 3289 1514 2654 393 417 ...
##  $ KODE       : int  11 12 13 14 15 16 17 18 19 21 ...

All data types are as expected.

Checking for missing values

sum(is.na(raw.data1))
## [1] 0
sum(is.na(raw.data2))
## [1] 0
sum(is.na(raw.data3))
## [1] 0

There are no missing values.

1.5.3 Frequency

install_load("DT")
datatable(raw.data1, filter = 'top', 
          options = list(pageLength = 10))

1.5.4 Percentage

datatable(raw.data2, filter = 'top', 
          options = list(pageLength = 10))

1.5.5 Villages

datatable(raw.data3, filter = 'top', 
          options = list(pageLength = 10))

1.6 Library

install_load("ppclust", "factoextra", "dplyr", "cluster", "fclust", "psych", 
             "FactoMineR", "ggplot2", "fmsb")

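2 Exploration

2.1 Correlation Between Variables

# drop row 35 and the Provinsi column, leaving the 34 rows and numeric variables used in the analysis
dtx <- raw.data2[-35,-1]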
pairs.panels(dtx, method = "pearson", stars=TRUE)

It can be seen that several pairs of variables have a statistically significant linear correlation. The distribution of each variable also tends to be right-skewed: most provinces have small values, while a few provinces have much larger values that form a long right tail. This suggests that extreme high values may influence the analysis.
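
Right skew can also be checked numerically. The sketch below uses a hypothetical vector `x`, not the report's data, and compares the mean with the median before computing a moment-based skewness coefficient.

```r
# Hypothetical skewed variable: many small values and a few large ones
x <- c(0, 0, 1, 1, 2, 2, 3, 5, 20, 40)

# For a right-skewed variable the mean is pulled above the median
print(c(mean = mean(x), median = median(x)))

# Moment-based sample skewness; a positive value indicates a long right tail
skew <- mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
print(skew)
```

Applying the same two checks to each column of the data would show which variables have the longest right tails.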

2.2 Outliers

boxplot(dtx)

As noted above, the data contain extreme values, or outliers.
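
The points that a boxplot draws beyond its whiskers can be listed with `boxplot.stats()`, which applies the same rule as `boxplot()` (values more than 1.5 times the box length away from the box). The vector `x` below is hypothetical and only illustrates the call.

```r
# Hypothetical values with one very large observation
x <- c(1, 2, 2, 3, 3, 4, 30)

# Values flagged as outliers by the boxplot rule
out <- boxplot.stats(x)$out
print(out)
```

Running the same call on each column, for example with `lapply()`, would list the flagged values variable by variable.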

3 Clustering

3.1 GMM

library("mclust")
mod1 = Mclust(dtx)
mod1$BIC
## Bayesian Information Criterion (BIC): 
##         EII       VII       EEI       VEI       EVI       VVI       EEE
## 1 -2662.641 -2662.641 -1844.747 -1844.747 -1844.747 -1844.747 -1769.896
## 2 -2619.910 -2572.704 -1836.728 -1758.811        NA        NA -1790.443
## 3 -2505.533 -2471.380 -1841.345 -1686.899        NA        NA -1788.481
## 4 -2464.491 -2358.342 -1853.884 -1679.718        NA        NA -1814.731
## 5 -2428.799 -2371.459 -1821.136 -1666.039        NA        NA -1786.245
## 6 -2398.412        NA -1785.704        NA        NA        NA -1768.457
## 7 -2422.417        NA -1788.439        NA        NA        NA -1792.576
## 8 -2435.464        NA -1781.657        NA        NA        NA -1793.891
## 9 -2422.871        NA -1774.321        NA        NA        NA -1787.903
##         VEE       EVE       VVE       EEV       VEV       EVV       VVV
## 1 -1769.896 -1769.896 -1769.896 -1769.896 -1769.896 -1769.896 -1769.896
## 2        NA        NA        NA -1967.937 -1801.372        NA        NA
## 3        NA        NA        NA        NA -1774.121        NA        NA
## 4        NA        NA        NA        NA        NA        NA        NA
## 5        NA        NA        NA        NA        NA        NA        NA
## 6        NA        NA        NA        NA        NA        NA        NA
## 7        NA        NA        NA        NA        NA        NA        NA
## 8        NA        NA        NA        NA        NA        NA        NA
## 9        NA        NA        NA        NA        NA        NA        NA
## 
## Top 3 models based on the BIC criterion: 
##     VEI,5     VEI,4     VEI,3 
## -1666.039 -1679.718 -1686.899

3.1.1 Number of Clusters

plot(mod1, what = 'BIC')

mod1b = Mclust(dtx, G = 5, modelNames = c("VEI"))
summary(mod1b, parameters = TRUE)
## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust VEI (diagonal, equal shape) model with 5 components: 
## 
##  log-likelihood  n df       BIC       ICL
##       -702.5442 34 74 -1666.039 -1666.041
## 
## Clustering table:
##  1  2  3  4  5 
## 15  2  4  6  7 
## 
## Mixing probabilities:
##          1          2          3          4          5 
## 0.44120997 0.05882351 0.11761936 0.17647041 0.20587676 
## 
## Means:
##              [,1]       [,2]         [,3]         [,4]         [,5]
## Gada 52.499481629 58.6000015 6.315995e+01 8.090834e+01 5.509838e+01
## KG    4.833780575  1.9550000 1.699916e+00 1.076667e+00 1.848583e+00
## KH    3.049179029  0.7750001 9.425229e-01 6.649999e-01 3.228608e+00
## GM    0.159328371  1.1050001 7.499107e-02 1.745952e-19 3.316218e-19
## APB  10.048409835  6.8000004 7.267441e+00 2.468335e+00 1.847121e+00
## GPL   5.673073961  1.1450000 3.170129e+00 1.230000e+00 1.141446e+00
## TSN   0.005999547  0.0600000 1.290277e-15 6.285241e-29 3.516587e-19
## GB   17.454556575 22.2749975 1.006292e+01 4.418326e+00 9.971200e-01
## BJR   1.991941180  0.9899999 1.305008e+00 5.900004e-01 6.728486e-01
## BB   20.964429152 12.8899996 1.400227e+01 9.368336e+00 3.988306e+01
## TL    9.450756470  6.9450004 1.216003e+01 3.226666e+00 4.344297e+00
## 
## Variances:
## [,,1]
##         Gada       KG       KH        GM     APB    GPL          TSN       GB
## Gada 302.967  0.00000  0.00000 0.0000000  0.0000  0.000 0.0000000000   0.0000
## KG     0.000 13.65424  0.00000 0.0000000  0.0000  0.000 0.0000000000   0.0000
## KH     0.000  0.00000 18.08222 0.0000000  0.0000  0.000 0.0000000000   0.0000
## GM     0.000  0.00000  0.00000 0.1303904  0.0000  0.000 0.0000000000   0.0000
## APB    0.000  0.00000  0.00000 0.0000000 64.9537  0.000 0.0000000000   0.0000
## GPL    0.000  0.00000  0.00000 0.0000000  0.0000 15.918 0.0000000000   0.0000
## TSN    0.000  0.00000  0.00000 0.0000000  0.0000  0.000 0.0001692191   0.0000
## GB     0.000  0.00000  0.00000 0.0000000  0.0000  0.000 0.0000000000 373.9283
## BJR    0.000  0.00000  0.00000 0.0000000  0.0000  0.000 0.0000000000   0.0000
## BB     0.000  0.00000  0.00000 0.0000000  0.0000  0.000 0.0000000000   0.0000
## TL     0.000  0.00000  0.00000 0.0000000  0.0000  0.000 0.0000000000   0.0000
##           BJR       BB       TL
## Gada 0.000000   0.0000  0.00000
## KG   0.000000   0.0000  0.00000
## KH   0.000000   0.0000  0.00000
## GM   0.000000   0.0000  0.00000
## APB  0.000000   0.0000  0.00000
## GPL  0.000000   0.0000  0.00000
## TSN  0.000000   0.0000  0.00000
## GB   0.000000   0.0000  0.00000
## BJR  3.322503   0.0000  0.00000
## BB   0.000000 184.0492  0.00000
## TL   0.000000   0.0000 52.02535
## [,,2]
##          Gada       KG       KH         GM     APB      GPL          TSN
## Gada 33.78676 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## KG    0.00000 1.522716 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## KH    0.00000 0.000000 2.016522 0.00000000 0.00000 0.000000 0.000000e+00
## GM    0.00000 0.000000 0.000000 0.01454108 0.00000 0.000000 0.000000e+00
## APB   0.00000 0.000000 0.000000 0.00000000 7.24361 0.000000 0.000000e+00
## GPL   0.00000 0.000000 0.000000 0.00000000 0.00000 1.775169 0.000000e+00
## TSN   0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 1.887124e-05
## GB    0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## BJR   0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## BB    0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## TL    0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
##            GB       BJR       BB       TL
## Gada  0.00000 0.0000000  0.00000 0.000000
## KG    0.00000 0.0000000  0.00000 0.000000
## KH    0.00000 0.0000000  0.00000 0.000000
## GM    0.00000 0.0000000  0.00000 0.000000
## APB   0.00000 0.0000000  0.00000 0.000000
## GPL   0.00000 0.0000000  0.00000 0.000000
## TSN   0.00000 0.0000000  0.00000 0.000000
## GB   41.70033 0.0000000  0.00000 0.000000
## BJR   0.00000 0.3705242  0.00000 0.000000
## BB    0.00000 0.0000000 20.52509 0.000000
## TL    0.00000 0.0000000  0.00000 5.801846
## [,,3]
##          Gada       KG       KH         GM      APB      GPL         TSN
## Gada 26.81707 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## KG    0.00000 1.208603 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## KH    0.00000 0.000000 1.600545 0.00000000 0.000000 0.000000 0.00000e+00
## GM    0.00000 0.000000 0.000000 0.01154148 0.000000 0.000000 0.00000e+00
## APB   0.00000 0.000000 0.000000 0.00000000 5.749366 0.000000 0.00000e+00
## GPL   0.00000 0.000000 0.000000 0.00000000 0.000000 1.408979 0.00000e+00
## TSN   0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 1.49784e-05
## GB    0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## BJR   0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## BB    0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## TL    0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
##           GB       BJR       BB       TL
## Gada  0.0000 0.0000000  0.00000 0.000000
## KG    0.0000 0.0000000  0.00000 0.000000
## KH    0.0000 0.0000000  0.00000 0.000000
## GM    0.0000 0.0000000  0.00000 0.000000
## APB   0.0000 0.0000000  0.00000 0.000000
## GPL   0.0000 0.0000000  0.00000 0.000000
## TSN   0.0000 0.0000000  0.00000 0.000000
## GB   33.0982 0.0000000  0.00000 0.000000
## BJR   0.0000 0.2940908  0.00000 0.000000
## BB    0.0000 0.0000000 16.29108 0.000000
## TL    0.0000 0.0000000  0.00000 4.605015
## [,,4]
##          Gada        KG        KH       GM      APB       GPL          TSN
## Gada 13.50905 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## KG    0.00000 0.6088314 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## KH    0.00000 0.0000000 0.8062713 0.000000 0.000000 0.0000000 0.000000e+00
## GM    0.00000 0.0000000 0.0000000 0.005814 0.000000 0.0000000 0.000000e+00
## APB   0.00000 0.0000000 0.0000000 0.000000 2.896232 0.0000000 0.000000e+00
## GPL   0.00000 0.0000000 0.0000000 0.000000 0.000000 0.7097707 0.000000e+00
## TSN   0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 7.545339e-06
## GB    0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## BJR   0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## BB    0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## TL    0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
##            GB       BJR       BB       TL
## Gada  0.00000 0.0000000 0.000000 0.000000
## KG    0.00000 0.0000000 0.000000 0.000000
## KH    0.00000 0.0000000 0.000000 0.000000
## GM    0.00000 0.0000000 0.000000 0.000000
## APB   0.00000 0.0000000 0.000000 0.000000
## GPL   0.00000 0.0000000 0.000000 0.000000
## TSN   0.00000 0.0000000 0.000000 0.000000
## GB   16.67316 0.0000000 0.000000 0.000000
## BJR   0.00000 0.1481477 0.000000 0.000000
## BB    0.00000 0.0000000 8.206602 0.000000
## TL    0.00000 0.0000000 0.000000 2.319767
## [,,5]
##          Gada       KG       KH          GM      APB      GPL          TSN
## Gada 22.86682 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## KG    0.00000 1.030571 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## KH    0.00000 0.000000 1.364778 0.000000000 0.000000 0.000000 0.000000e+00
## GM    0.00000 0.000000 0.000000 0.009841379 0.000000 0.000000 0.000000e+00
## APB   0.00000 0.000000 0.000000 0.000000000 4.902462 0.000000 0.000000e+00
## GPL   0.00000 0.000000 0.000000 0.000000000 0.000000 1.201431 0.000000e+00
## TSN   0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 1.277202e-05
## GB    0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## BJR   0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## BB    0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## TL    0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
##            GB       BJR       BB       TL
## Gada  0.00000 0.0000000  0.00000 0.000000
## KG    0.00000 0.0000000  0.00000 0.000000
## KH    0.00000 0.0000000  0.00000 0.000000
## GM    0.00000 0.0000000  0.00000 0.000000
## APB   0.00000 0.0000000  0.00000 0.000000
## GPL   0.00000 0.0000000  0.00000 0.000000
## TSN   0.00000 0.0000000  0.00000 0.000000
## GB   28.22271 0.0000000  0.00000 0.000000
## BJR   0.00000 0.2507701  0.00000 0.000000
## BB    0.00000 0.0000000 13.89134 0.000000
## TL    0.00000 0.0000000  0.00000 3.926679
library(dplyr)

# Count the observations in each cluster
cluster_frequencies <- table(mod1b$classification)

# Order the clusters by frequency, largest first
sorted_clusters <- names(sort(cluster_frequencies, decreasing = TRUE))

# Build the desired label order (1, 2, 3, 4, 5)
new_order <- seq_along(sorted_clusters)

# Relabel the clusters so that cluster 1 is the largest
mod1b$classification <- recode(mod1b$classification, !!!setNames(as.character(new_order), sorted_clusters))

table(mod1b$classification)
## 
##  1  2  3  4  5 
## 15  7  6  4  2
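
The relabelling logic can be illustrated on a small example using base R only. The vector `labels` is hypothetical, and `match()` is used here as an alternative to `recode()`; it is a sketch of the same idea, not the code that produced the table above.

```r
# Hypothetical cluster labels
labels <- c(3, 1, 3, 2, 3, 1)

# Cluster names ordered from most to least frequent
by_size <- names(sort(table(labels), decreasing = TRUE))

# The new label is the rank of the old label in that ordering
relabelled <- match(as.character(labels), by_size)

print(table(relabelled))
```

After relabelling, cluster 1 is always the largest group, which keeps cluster numbers comparable across methods.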

3.1.2 Cluster Plot

library(factoextra)
fviz_cluster(mod1b, data = dtx, repel = TRUE, labelsize =8)

3.1.3 Profile

data.clust1 <- cbind(dtx, Cluster = mod1b[["classification"]])

# Calculate the mean of each variable for each cluster
cluster_profiles1 <- aggregate(. ~ Cluster, data.clust1, mean)

# Print the cluster profiles
print(cluster_profiles1)
##   Cluster     Gada       KG       KH        GM       APB      GPL   TSN
## 1       1 52.49867 4.834000 3.049333 0.1593333 10.048667 5.673333 0.006
## 2       2 55.09857 1.848571 3.228571 0.0000000  1.847143 1.141429 0.000
## 3       3 80.90833 1.076667 0.665000 0.0000000  2.468333 1.230000 0.000
## 4       4 63.16000 1.700000 0.942500 0.0750000  7.267500 3.170000 0.000
## 5       5 58.60000 1.955000 0.775000 1.1050000  6.800000 1.145000 0.060
##           GB       BJR        BB        TL
## 1 17.4553333 1.9920000 20.964667  9.450667
## 2  0.9971429 0.6728571 39.882857  4.344286
## 3  4.4183333 0.5900000  9.368333  3.226667
## 4 10.0625000 1.3050000 14.002500 12.160000
## 5 22.2750000 0.9900000 12.890000  6.945000
# Convert the data to long format for plotting
cluster_profiles_long1 <- tidyr::pivot_longer(cluster_profiles1, -Cluster, 
                                             names_to = "Variable", values_to = "Value")

# Create the bar plot
ggplot(cluster_profiles_long1, aes(x = Cluster, y = Value, fill = Variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Cluster", y = "Mean Value", fill = "Variable") +
  theme_minimal() +
  ggtitle("Cluster Profiles")

library(ggiraphExtra)

data.akhir1 <- cbind(raw.data2[-35,], Cluster =  mod1b[["classification"]]) %>% 
  relocate(Cluster, .before = 2)

# Radar Plot
ggRadar(
  data = data.akhir1,
  mapping = aes(colour = Cluster)
) + 
theme_light() +
theme(
  text = element_text(size = 10),  # global font size
  title = element_text(size = 12),  # title font size
  axis.text = element_text(size = 10),  # axis label font size
  legend.text = element_text(size = 8)  # legend font size
)

3.1.4 Map

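install_load("spdep", "sf")  # st_read() comes from sf; rgdal is no longer available on CRAN
# `wd` must hold the path to the project folder; it is not defined in this report
indo <- st_read(dsn= paste0(wd,"/SHP Indonesia/prov.shp"), 
                quiet = TRUE)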
data.map <- cbind(data.clust1, KODE=raw.data3$KODE[-35])
  
data.indo <- indo %>%
  inner_join(data.map, by = c("KODE" = "KODE"))
ggplot() +  
  geom_sf(data=data.indo, aes(fill=factor(`Cluster`))) +
  scale_fill_manual(values=c("1" = "indianred", "2" = "lightgreen", "3" = "dodgerblue",
                             "4"="cyan3", "5"="purple3"), 
                    name = "Legend") +
  labs(title = "Natural Disaster Clusters \n across Indonesian Provinces, 2021",
       x = "Longitude",
       y = "Latitude") +
  theme_minimal() +
  theme(legend.text = element_text(size=10),
        legend.title = element_text(size=10, face="bold"),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        plot.title = element_text(size=12, face="bold", hjust = 0.5)) +
  scale_x_continuous(labels = function(x) paste0(x, "°")) +
  scale_y_continuous(labels = function(y) paste0(y, "°"))

3.1.5 Export Cluster Data

# Export data
install_load('openxlsx')
# Tentative model
write.xlsx(list("GMM" = data.akhir1), 
           file = "Data_Cluster.xlsx")

3.2 Number of Clusters

#calculate gap statistic for each number of clusters (up to 10 clusters)
gap_stat <- clusGap(dtx, FUN = hcut, nstart = 25, K.max = 10, B = 50)

#produce plot of clusters vs. gap statistic
fviz_gap_stat(gap_stat)

## Silhouette coefficient and elbow

fviz_nbclust(dtx, kmeans, method = "silhouette") # silhouette, k=3

fviz_nbclust(dtx, kmeans, method = "wss") # wss, k=1

fviz_nbclust(x = dtx, FUNcluster = kmeans, method = "gap_stat") # gap_stat, k=1
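
The quantity behind the elbow ("wss") plot can be computed directly with base R. The sketch below uses simulated data (`toy`, three well-separated groups) and an assumed range of k from 1 to 6; it only illustrates what the curve measures and does not reproduce the plots above.

```r
set.seed(42)
# Hypothetical data: three well-separated groups in two dimensions
toy <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.5), ncol = 2)
)

# Total within-cluster sum of squares for k = 1, ..., 6
wss <- sapply(1:6, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

print(round(wss, 1))
# plot(1:6, wss, type = "b") would draw the elbow curve
```

The elbow is the value of k after which adding more clusters reduces the within-cluster sum of squares only slightly.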

library(NbClust)
library("factoextra")
nb <- NbClust(data = dtx, distance = "euclidean", method="kmeans")

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
## 

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 5 proposed 2 as the best number of clusters 
## * 6 proposed 3 as the best number of clusters 
## * 4 proposed 4 as the best number of clusters 
## * 2 proposed 5 as the best number of clusters 
## * 1 proposed 12 as the best number of clusters 
## * 3 proposed 13 as the best number of clusters 
## * 1 proposed 14 as the best number of clusters 
## * 1 proposed 15 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  3 
##  
##  
## *******************************************************************
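# fviz_nbclust(nb) stops with "the condition has length > 1" in this session,
# so the votes per number of clusters are plotted directly from the NbClust result
barplot(table(nb$Best.nc["Number_clusters", ]),
        xlab = "Number of clusters", ylab = "Number of indices")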

3.3 Fuzzy

res.fcm <- fcm(dtx, centers=3)
as.data.frame(res.fcm$u)[1:6,]
##   Cluster 1 Cluster 2  Cluster 3
## 1 0.2467255 0.7281924 0.02508206
## 2 0.1461973 0.8098839 0.04391880
## 3 0.3019321 0.2170957 0.48097216
## 4 0.4832460 0.4792199 0.03753414
## 5 0.7372493 0.2379269 0.02482383
## 6 0.1431290 0.8231577 0.03371328

3.3.1 Initial and final cluster prototype matrices

res.fcm$v0
##            Gada   KG   KH GM   APB  GPL TSN    GB  BJR    BB    TL
## Cluster 1 56.93 0.00 0.37  0  0.00 1.87   0  0.37 0.00 40.82  2.62
## Cluster 2 67.43 0.00 3.56  0 19.59 4.33   0  0.00 0.00 15.01  0.25
## Cluster 3 44.26 3.71 1.55  0 21.40 4.92   0 31.41 5.61 29.51 19.15
res.fcm$v
##               Gada       KG       KH         GM      APB      GPL         TSN
## Cluster 1 55.18355 3.134711 3.071610 0.05494674 4.228712 2.255503 0.004726236
## Cluster 2 68.89251 2.482674 1.664947 0.15613349 6.220171 3.243801 0.006948835
## Cluster 3 33.03813 2.531644 1.422857 0.11012486 6.770252 7.358808 0.001777819
##                  GB      BJR       BB        TL
## Cluster 1  4.890740 1.201827 34.09940  6.124082
## Cluster 2  7.439088 1.083219 13.37960  6.376892
## Cluster 3 51.059097 2.014997 21.96576 13.126262
summary(res.fcm)
## Summary for 'res.fcm'
## 
## Number of data objects:  34 
## 
## Number of clusters:  3 
## 
## Crisp clustering vector:
##  [1] 2 2 3 1 1 2 2 2 2 2 1 1 2 2 2 1 2 2 1 1 1 1 1 1 2 1 2 2 1 3 2 3 2 2
## 
## Initial cluster prototypes:
##            Gada   KG   KH GM   APB  GPL TSN    GB  BJR    BB    TL
## Cluster 1 56.93 0.00 0.37  0  0.00 1.87   0  0.37 0.00 40.82  2.62
## Cluster 2 67.43 0.00 3.56  0 19.59 4.33   0  0.00 0.00 15.01  0.25
## Cluster 3 44.26 3.71 1.55  0 21.40 4.92   0 31.41 5.61 29.51 19.15
## 
## Final cluster prototypes:
##               Gada       KG       KH         GM      APB      GPL         TSN
## Cluster 1 55.18355 3.134711 3.071610 0.05494674 4.228712 2.255503 0.004726236
## Cluster 2 68.89251 2.482674 1.664947 0.15613349 6.220171 3.243801 0.006948835
## Cluster 3 33.03813 2.531644 1.422857 0.11012486 6.770252 7.358808 0.001777819
##                  GB      BJR       BB        TL
## Cluster 1  4.890740 1.201827 34.09940  6.124082
## Cluster 2  7.439088 1.083219 13.37960  6.376892
## Cluster 3 51.059097 2.014997 21.96576 13.126262
## 
## Distance between the final cluster prototypes
##           Cluster 1 Cluster 2
## Cluster 2  631.1748          
## Cluster 3 2854.4422 3325.6859
## 
## Difference between the initial and final cluster prototypes
##                 Gada        KG         KH         GM        APB        GPL
## Cluster 1  -1.746453  3.134711  2.7016103 0.05494674   4.228712  0.3855031
## Cluster 2   1.462514  2.482674 -1.8950533 0.15613349 -13.369829 -1.0861992
## Cluster 3 -11.221875 -1.178356 -0.1271435 0.11012486 -14.629748  2.4388080
##                   TSN        GB       BJR        BB        TL
## Cluster 1 0.004726236  4.520740  1.201827 -6.720602  3.504082
## Cluster 2 0.006948835  7.439088  1.083219 -1.630404  6.126892
## Cluster 3 0.001777819 19.649097 -3.595003 -7.544241 -6.023738
## 
## Root Mean Squared Deviations (RMSD): 20.37673 
## Mean Absolute Deviation (MAD): 482.0302 
## 
## Membership degrees matrix (top and bottom 5 rows): 
##   Cluster 1 Cluster 2  Cluster 3
## 1 0.2467255 0.7281924 0.02508206
## 2 0.1461973 0.8098839 0.04391880
## 3 0.3019321 0.2170957 0.48097216
## 4 0.4832460 0.4792199 0.03753414
## 5 0.7372493 0.2379269 0.02482383
## ...
##     Cluster 1 Cluster 2  Cluster 3
## 30 0.11375796 0.1036085 0.78263353
## 31 0.15646524 0.8071614 0.03637335
## 32 0.02702304 0.0229784 0.94999856
## 33 0.12893606 0.8220949 0.04896906
## 34 0.22157498 0.6982273 0.08019770
## 
## Descriptive statistics for the membership degrees by clusters
##           Size       Min        Q1      Mean    Median        Q3       Max
## Cluster 1   13 0.3978128 0.4832460 0.7121100 0.8004335 0.8709146 0.9022756
## Cluster 2   18 0.4433379 0.6685871 0.7360856 0.8085227 0.8312259 0.8553647
## Cluster 3    3 0.4809722 0.6318028 0.7378681 0.7826335 0.8663160 0.9499986
## 
## Dunn's Fuzziness Coefficients:
## dunn_coeff normalized 
##  0.6156167  0.4234250 
## 
## Within cluster sum of squares by cluster:
##        1        2        3 
## 4289.080 4747.395 1882.548 
## (between_SS / total_SS =  54.16%) 
## 
## Available components: 
##  [1] "u"          "v"          "v0"         "d"          "x"         
##  [6] "cluster"    "csize"      "sumsqrs"    "k"          "m"         
## [11] "iter"       "best.start" "func.val"   "comp.time"  "inpargs"   
## [16] "algorithm"  "call"

3.3.2 Run FCM with Multiple Starts

res.fcm <- fcm(dtx, centers=3, nstart=5)

res.fcm$func.val
## [1] 6826.256 6826.256 6826.256 6826.256 6826.256
res.fcm$iter
## [1] 151 127 153 135 133
res.fcm$best.start
## [1] 1
summary(res.fcm)
## Summary for 'res.fcm'
## 
## Number of data objects:  34 
## 
## Number of clusters:  3 
## 
## Crisp clustering vector:
##  [1] 2 2 3 1 1 2 2 2 2 2 1 1 2 2 2 1 2 2 1 1 1 1 1 1 2 1 2 2 1 3 2 3 2 2
## 
## Initial cluster prototypes:
##            Gada   KG   KH GM  APB  GPL  TSN    GB  BJR    BB    TL
## Cluster 1 58.42 1.30 1.36  0 5.82 5.16 0.00 18.97 2.01 15.65 11.85
## Cluster 2 16.15 3.38 1.23  0 4.00 5.38 0.00 76.62 1.08 19.23 22.15
## Cluster 3 58.89 4.96 0.52  0 6.89 2.51 0.06 10.18 2.64 27.00  6.19
## 
## Final cluster prototypes:
##               Gada       KG       KH         GM      APB      GPL         TSN
## Cluster 1 55.18355 3.134711 3.071610 0.05494674 4.228712 2.255503 0.004726236
## Cluster 2 68.89251 2.482674 1.664947 0.15613349 6.220171 3.243801 0.006948835
## Cluster 3 33.03813 2.531644 1.422857 0.11012486 6.770252 7.358808 0.001777819
##                  GB      BJR       BB        TL
## Cluster 1  4.890740 1.201827 34.09940  6.124082
## Cluster 2  7.439088 1.083219 13.37960  6.376892
## Cluster 3 51.059097 2.014997 21.96576 13.126262
## 
## Distance between the final cluster prototypes
##           Cluster 1 Cluster 2
## Cluster 2  631.1748          
## Cluster 3 2854.4422 3325.6859
## 
## Difference between the initial and final cluster prototypes
##                 Gada        KG        KH         GM        APB       GPL
## Cluster 1  -3.236453  1.834711 1.7116103 0.05494674 -1.5912880 -2.904497
## Cluster 2  52.742514 -0.897326 0.4349467 0.15613349  2.2201706 -2.136199
## Cluster 3 -25.851875 -2.428356 0.9028565 0.11012486 -0.1197479  4.848808
##                    TSN        GB          BJR        BB         TL
## Cluster 1  0.004726236 -14.07926 -0.808173316 18.449398  -5.725918
## Cluster 2  0.006948835 -69.18091  0.003219146 -5.850404 -15.773108
## Cluster 3 -0.058222181  40.87910 -0.625002729 -5.034241   6.936262
## 
## Root Mean Squared Deviations (RMSD): 60.28987 
## Mean Absolute Deviation (MAD): 1054.524 
## 
## Membership degrees matrix (top and bottom 5 rows): 
##   Cluster 1 Cluster 2  Cluster 3
## 1 0.2467255 0.7281924 0.02508206
## 2 0.1461973 0.8098839 0.04391880
## 3 0.3019321 0.2170957 0.48097216
## 4 0.4832460 0.4792199 0.03753414
## 5 0.7372493 0.2379269 0.02482383
## ...
##     Cluster 1 Cluster 2  Cluster 3
## 30 0.11375796 0.1036085 0.78263353
## 31 0.15646524 0.8071614 0.03637335
## 32 0.02702304 0.0229784 0.94999856
## 33 0.12893606 0.8220949 0.04896906
## 34 0.22157498 0.6982273 0.08019770
## 
## Descriptive statistics for the membership degrees by clusters
##           Size       Min        Q1      Mean    Median        Q3       Max
## Cluster 1   13 0.3978128 0.4832460 0.7121100 0.8004335 0.8709146 0.9022756
## Cluster 2   18 0.4433379 0.6685871 0.7360856 0.8085227 0.8312259 0.8553647
## Cluster 3    3 0.4809722 0.6318028 0.7378681 0.7826335 0.8663160 0.9499986
## 
## Dunn's Fuzziness Coefficients:
## dunn_coeff normalized 
##  0.6156167  0.4234250 
## 
## Within cluster sum of squares by cluster:
##        1        2        3 
## 4289.080 4747.395 1882.548 
## (between_SS / total_SS =  54.16%) 
## 
## Available components: 
##  [1] "u"          "v"          "v0"         "d"          "x"         
##  [6] "cluster"    "csize"      "sumsqrs"    "k"          "m"         
## [11] "iter"       "best.start" "func.val"   "comp.time"  "inpargs"   
## [16] "algorithm"  "call"
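The crisp clustering vector in the summary above can be read as the column with the largest membership degree in each row. A minimal base-R sketch, using only the first five rows printed in the membership matrix (the object names `u_head` and `crisp_head` are hypothetical and not part of the original analysis):

```r
# First five rows of the membership matrix, copied from the summary output
u_head <- matrix(
  c(0.2467255, 0.7281924, 0.02508206,
    0.1461973, 0.8098839, 0.04391880,
    0.3019321, 0.2170957, 0.48097216,
    0.4832460, 0.4792199, 0.03753414,
    0.7372493, 0.2379269, 0.02482383),
  ncol = 3, byrow = TRUE,
  dimnames = list(as.character(1:5), paste("Cluster", 1:3))
)

# Hard (crisp) label = index of the largest membership degree per row
crisp_head <- unname(apply(u_head, 1, which.max))
print(crisp_head)
## [1] 2 2 3 1 1
```

These labels agree with the first five entries of the crisp clustering vector. Row 4 is a borderline case: its memberships in clusters 1 and 2 are nearly equal (0.483 against 0.479).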

3.3.3 Pairwise Scatter Plots

plotcluster(res.fcm, cp=1, trans=TRUE)

set.seed(12333333)
res.fcm2 <- ppclust2(res.fcm, "kmeans")

# Count the number of observations in each cluster
cluster_frequencies <- table(res.fcm2[["cluster"]])

# Sort the cluster labels by frequency, largest first
sorted_clusters <- names(sort(cluster_frequencies, decreasing = TRUE))

# Build the new label order (1, 2, 3)
new_order <- seq_along(sorted_clusters)

# Relabel the clusters so that cluster 1 is the largest
res.fcm2[["cluster"]] <- recode(res.fcm2[["cluster"]], !!!setNames(as.character(new_order), sorted_clusters))

table(res.fcm2[["cluster"]])
## 
##  1  2  3 
## 18 13  3
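The relabelling step above (largest cluster becomes 1, next largest 2, and so on) can also be sketched in base R with `match()`. This is an illustration on a hypothetical toy vector, not the code used in the report:

```r
# Toy cluster labels (hypothetical): label 3 is most frequent, then 1, then 2
cluster <- c(3, 3, 1, 3, 2, 1, 3)

# Labels ordered from the largest to the smallest cluster
sorted_clusters <- names(sort(table(cluster), decreasing = TRUE))

# match() returns, for each observation, the position of its old label
# in the sorted list, which is exactly its new label
relabelled <- match(as.character(cluster), sorted_clusters)

print(table(relabelled))
```

Unlike the `recode()` call, this variant returns integer labels rather than character strings, which may be convenient for later numeric indexing.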

3.3.4 Cluster Plot with fviz_cluster

fviz_cluster(res.fcm2, data = dtx, 
  ellipse.type = "convex",
  palette = "jco",
  repel = TRUE)

table(res.fcm2[["cluster"]])
## 
##  1  2  3 
## 18 13  3
data.akhir <- cbind(raw.data2[-35,], Cluster = res.fcm2[["cluster"]]) %>% 
  relocate(Cluster, .before = 2)


datatable(data.akhir)
table(data.akhir$Cluster)
## 
##  1  2  3 
## 18 13  3

3.3.5 Profile of Each Cluster

data.clust <- cbind(dtx, Cluster = res.fcm2[["cluster"]])

# Calculate the mean of each variable for each cluster
cluster_profiles <- aggregate(. ~ Cluster, data.clust, mean)

# Print the cluster profiles
print(cluster_profiles)
##   Cluster     Gada       KG       KH         GM      APB      GPL         TSN
## 1       1 68.34444 2.394444 1.768889 0.17833333 6.526111 3.591111 0.008333333
## 2       2 54.04385 3.956154 3.207692 0.12384615 5.751538 2.320000 0.004615385
## 3       3 31.89667 2.696667 1.370000 0.02666667 9.633333 6.880000 0.000000000
##          GB       BJR       BB        TL
## 1  8.217778 0.9611111 12.75389  6.284444
## 2  5.773077 1.5130769 33.14000  7.151538
## 3 52.383333 2.7866667 23.75333 15.990000
# Convert the data to long format for plotting
cluster_profiles_long <- tidyr::pivot_longer(cluster_profiles, -Cluster, 
                                             names_to = "Variable", values_to = "Value")

# Create the bar plot
ggplot(cluster_profiles_long, aes(x = Cluster, y = Value, fill = Variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Cluster", y = "Mean Value", fill = "Variable") +
  theme_minimal() +
  ggtitle("Cluster Profiles")

3.3.6 Radar Plot

library(ggiraphExtra)

# Radar Plot
ggRadar(
  data = data.akhir,
  mapping = aes(colours = Cluster),
) + 
theme_light() +
theme(
  text = element_text(size = 10),  # Set the global font size
  title = element_text(size = 12),  # Set the title font size
  axis.text = element_text(size = 10),  # Set the axis label font size
  legend.text = element_text(size = 8)  # Set the legend font size
)

3.3.7 Validation of the Clustering Results

res.fcm4 <- ppclust2(res.fcm, "fclust")

# Fuzzy Silhouette Index:
idxsf <- SIL.F(res.fcm4$Xca, res.fcm4$U, alpha=1)
paste("Fuzzy Silhouette Index: ",idxsf)
## [1] "Fuzzy Silhouette Index:  0.552167291736516"
# Partition Entropy:
idxpe <- PE(res.fcm4$U)
paste("Partition Entropy: ",idxpe)
## [1] "Partition Entropy:  0.66950689169937"
# Partition Coefficient:
idxpc <- PC(res.fcm4$U)
paste("Partition Coefficient : ",idxpc)
## [1] "Partition Coefficient :  0.615616663330754"
# Modified Partition Coefficient:
idxmpc <- MPC(res.fcm4$U)
paste("Modified Partition Coefficient :",idxmpc)
## [1] "Modified Partition Coefficient : 0.423424994996131"
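The three partition-based indices above depend only on the membership matrix. A minimal base-R sketch of the standard definitions, assuming a natural-log partition entropy and using a hypothetical toy matrix `u` (not the report's data):

```r
# Toy membership matrix (hypothetical values); each row sums to 1
u <- matrix(c(0.8, 0.1, 0.1,
              0.2, 0.7, 0.1,
              0.1, 0.2, 0.7,
              0.4, 0.4, 0.2),
            ncol = 3, byrow = TRUE)
n <- nrow(u)
k <- ncol(u)

pc  <- sum(u^2) / n                 # Partition Coefficient, between 1/k and 1
mpc <- 1 - k / (k - 1) * (1 - pc)   # Modified PC, rescaled to lie in [0, 1]
pe  <- -sum(u * log(u)) / n         # Partition Entropy (assumes no zero memberships)

# Consistency check on the values reported above (k = 3 clusters):
mpc_from_reported_pc <- 1 - 3 / 2 * (1 - 0.615616663330754)
print(mpc_from_reported_pc)
```

The last line reproduces the reported Modified Partition Coefficient (about 0.4234) from the reported Partition Coefficient, which in turn equals the Dunn coefficient shown in the `summary(res.fcm)` output.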

3.3.8 Gap Index

install_load("clusterSim")
cl1<-pam(dtx,4)
cl2<-pam(dtx,5)
clall<-cbind(cl1$clustering,cl2$clustering)
g<-index.Gap(dtx, clall, reference.distribution="unif", B=10,method="pam")

print(g)
## $gap
## [1] 0.9103916
## 
## $diffu
## [1] -0.02054693

3.3.9 Davies-Bouldin’s index

cl2 <- pam(dtx, 5)
print(index.DB(dtx, cl2$clustering, centrotypes="centroids"))
## $DB
## [1] 0.9217699
## 
## $r
## [1] 1.2068431 1.2068431 1.0388518 0.8475622 0.3087490
## 
## $R
##           [,1]      [,2]       [,3]       [,4]       [,5]
## [1,]       Inf 1.2068431 1.03885184 0.84756224 0.19491166
## [2,] 1.2068431       Inf 0.56056288 0.70977887 0.30874896
## [3,] 1.0388518 0.5605629        Inf 0.37658386 0.08746469
## [4,] 0.8475622 0.7097789 0.37658386        Inf 0.08142941
## [5,] 0.1949117 0.3087490 0.08746469 0.08142941        NaN
## 
## $d
##          1        2        3        4        5
## 1  0.00000 28.19528 24.04519 27.79893 83.53662
## 2 28.19528  0.00000 47.17071 35.25616 57.47393
## 3 24.04519 47.17071  0.00000 42.42400 99.43591
## 4 27.79893 35.25616 42.42400  0.00000 89.39108
## 5 83.53662 57.47393 99.43591 89.39108  0.00000
## 
## $S
## [1] 16.282261 17.745016  8.697131  7.279063  0.000000
## 
## $centers
##          [,1]     [,2]     [,3]     [,4]     [,5]     [,6] [,7]      [,8]
## [1,] 60.89062 4.028750 2.794375 0.246875 9.547500 4.231875 0.01  7.655625
## [2,] 46.20000 3.280000 1.228000 0.190000 7.892000 5.422000 0.01 29.964000
## [3,] 80.90833 1.076667 0.665000 0.000000 2.468333 1.230000 0.00  4.418333
## [4,] 53.60167 1.986667 3.596667 0.000000 1.685000 1.310000 0.00  0.780000
## [5,] 16.15000 3.380000 1.230000 0.000000 4.000000 5.380000 0.00 76.620000
##           [,9]     [,10]     [,11]
## [1,] 1.4118750 17.391875  8.657500
## [2,] 2.9000000 25.846000  9.454000
## [3,] 0.5900000  9.368333  3.226667
## [4,] 0.6033333 41.451667  4.460000
## [5,] 1.0800000 19.230000 22.150000
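The `$DB` value can be rebuilt from the `$S` and `$d` components printed above: each pair of clusters gets a similarity R_ij = (S_i + S_j) / d_ij, each cluster keeps its largest similarity, and the index is the mean of those maxima. A minimal base-R sketch with the printed (rounded) values copied in by hand:

```r
# Within-cluster dispersions ($S) and centroid distances ($d) from the output above
S <- c(16.282261, 17.745016, 8.697131, 7.279063, 0.000000)
d <- matrix(c( 0.00000, 28.19528, 24.04519, 27.79893, 83.53662,
              28.19528,  0.00000, 47.17071, 35.25616, 57.47393,
              24.04519, 47.17071,  0.00000, 42.42400, 99.43591,
              27.79893, 35.25616, 42.42400,  0.00000, 89.39108,
              83.53662, 57.47393, 99.43591, 89.39108,  0.00000),
            nrow = 5, byrow = TRUE)

R <- outer(S, S, "+") / d             # R[i, j] = (S_i + S_j) / d_ij
diag(R) <- NA                         # a cluster is not compared with itself
r <- apply(R, 1, max, na.rm = TRUE)   # worst-case similarity per cluster
DB <- mean(r)
print(round(DB, 4))
## [1] 0.9218
```

Lower values indicate more compact, better separated clusters. The dispersion of 0 for cluster 5 suggests that it may contain a single observation, which is worth keeping in mind when interpreting the index.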

3.3.10 Calinski-Harabasz pseudo F-statistic

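# Note: the index is evaluated here for a 10-cluster PAM partition
cl10 <- pam(dtx, 10)  # avoid naming the object `c`, which masks base::c
index.G1(dtx, cl10$clustering)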
## [1] 17.93938
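The Calinski-Harabasz pseudo F-statistic is the ratio of between-cluster to within-cluster variance, each divided by its degrees of freedom: (BSS / (k - 1)) / (WSS / (n - k)). A minimal base-R sketch of that formula follows; the helper `ch_index()` is hypothetical, and the built-in `iris` data stands in for `dtx`:

```r
# Calinski-Harabasz pseudo F: (BSS / (k - 1)) / (WSS / (n - k))
ch_index <- function(x, cl) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cl))
  tss <- sum(sweep(x, 2, colMeans(x))^2)       # total sum of squares
  wss <- sum(sapply(split(seq_len(n), cl), function(idx) {
    xi <- x[idx, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi))^2)          # sum of squares within one cluster
  }))
  ((tss - wss) / (k - 1)) / (wss / (n - k))
}

set.seed(1)
x_demo  <- as.matrix(iris[, 1:4])              # stand-in data, not dtx
km_demo <- kmeans(x_demo, centers = 3, nstart = 10)
ch_demo <- ch_index(x_demo, km_demo$cluster)
print(ch_demo)
```

Larger values indicate a better-separated partition, so the statistic is usually compared across several candidate numbers of clusters rather than read in isolation.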

3.4 K-Means

df <- scale(dtx)
set.seed(112233)
km <- kmeans(df, 3, nstart = 25)
p <- fviz_cluster(km, data = dtx, repel=TRUE,
             ellipse.type = "convex") # save to access $data

# save '$data'
dt <- p$data # this is all you need

# Count the number of observations in each cluster
cluster_frequencies <- table(dt$cluster)

# Sort the cluster labels by frequency, largest first
sorted_clusters <- names(sort(cluster_frequencies, decreasing = TRUE))

# Build the new label order (1, 2, 3)
new_order <- seq_along(sorted_clusters)

# Relabel the clusters so that cluster 1 is the largest
dt$cluster <- recode(dt$cluster, !!!setNames(as.character(new_order), sorted_clusters))

table(dt$cluster)
## 
##  2  3  1 
## 10  3 21
# calculate the convex hull using chull(), for each cluster
hull_data <- dt %>%
  group_by(cluster) %>%
  slice(chull(x, y))


# plot: you can now customize this by using ggplot syntax
ggplot(dt, aes(x, y, colour = cluster)) + geom_point() +
  geom_polygon(data = hull_data, alpha = 0.2, aes(fill=cluster)) 

table(dt$cluster)
## 
##  2  3  1 
## 10  3 21

3.4.1 Applying K-means with 3 Clusters

km.res <- kmeans(dtx, centers = 3)

# Print the clustering results
print(km.res)
## K-means clustering with 3 clusters of sizes 17, 2, 15
## 
## Cluster means:
##       Gada       KG       KH        GM      APB      GPL         TSN        GB
## 1 53.47353 3.800588 2.146471 0.1858824 7.862353 2.616471 0.006470588 10.297059
## 2 25.71500 2.190000 1.280000 0.0400000 3.750000 7.860000 0.000000000 62.870000
## 3 71.19867 2.242000 2.573333 0.1106667 5.332000 3.682667 0.006666667  5.288667
##         BJR       BB        TL
## 1 1.7823529 29.25353  8.856471
## 2 1.3750000 20.87500 14.410000
## 3 0.8186667 12.83933  4.978667
## 
## Clustering vector:
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
##  3  3  1  3  1  3  3  3  3  3  1  1  3  1  1  1  3  3  1  1  1  1  1  1  1  1 
## 27 28 29 30 31 32 33 34 
##  1  3  1  2  3  2  3  3 
## 
## Within cluster sum of squares by cluster:
## [1] 6830.540  701.771 3280.799
##  (between_SS / total_SS =  55.3 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
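The `kmeans(dtx, centers = 3)` call above runs on the unscaled data with a single random start and no seed, so the cluster sizes (17, 2, 15) may differ between runs. A hedged sketch of a reproducible variant, which also shows where the `between_SS / total_SS` figure comes from; the built-in `USArrests` data stands in for `dtx`, and the object names are hypothetical:

```r
set.seed(112233)                        # fix the random starts for reproducibility
x_km   <- as.matrix(USArrests)          # stand-in data, not dtx
km_rep <- kmeans(x_km, centers = 3, nstart = 25)  # keep the best of 25 starts

# Share of total variation explained by the partition
ratio <- km_rep$betweenss / km_rep$totss
print(round(100 * ratio, 1))

# The decomposition behind that ratio: total = between + within
print(all.equal(km_rep$totss, km_rep$betweenss + km_rep$tot.withinss))
```

Using `nstart` greater than 1 keeps the solution with the lowest total within-cluster sum of squares, which tends to make results less sensitive to the initial centers.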

3.4.2 Cluster Profiling

data.clust3 <- cbind(dtx, Cluster = dt$cluster)

# Calculate the mean of each variable for each cluster
cluster_profiles3 <- aggregate(. ~ Cluster, data.clust3, mean)

# Print the cluster profiles
print(cluster_profiles3)
##   Cluster     Gada       KG       KH        GM       APB      GPL         TSN
## 1       2 46.43500 6.009000 2.858000 0.2370000 11.543000 6.261000 0.000000000
## 2       3 58.69667 2.956667 0.690000 0.7366667  6.830000 1.600000 0.060000000
## 3       1 66.09619 1.602857 2.238095 0.0152381  4.058095 2.287143 0.001428571
##         GB       BJR       BB        TL
## 1 23.55900 2.5790000 21.39300 12.651000
## 2 18.24333 1.5400000 17.59333  6.693333
## 3  4.27619 0.7104762 22.14000  5.117619
# Convert the data to long format for plotting
cluster_profiles_long3 <- tidyr::pivot_longer(cluster_profiles3, -Cluster, 
                                             names_to = "Variable", values_to = "Value")

# Create the bar plot
ggplot(cluster_profiles_long3, aes(x = Cluster, y = Value, fill = Variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Cluster", y = "Mean Value", fill = "Variable") +
  theme_minimal() +
  ggtitle("Cluster Profiles")

data.akhir3 <- cbind(raw.data2[-35,], Cluster =  dt$cluster) %>% 
  relocate(Cluster, .before = 2)

# Radar Plot
ggRadar(
  data = data.akhir3,
  mapping = aes(colours = Cluster),
) + 
theme_light() +
theme(
  text = element_text(size = 10),  # Set the global font size
  title = element_text(size = 12),  # Set the title font size
  axis.text = element_text(size = 10),  # Set the axis label font size
  legend.text = element_text(size = 8)  # Set the legend font size
)